Abstract. In this paper, we present a protocol for computing the principal
eigenvector of a collection of data matrices belonging to multiple
semi-honest parties with privacy constraints. Our proposed protocol is 
based on secure multi-party computation with a semi-honest arbitrator 
who deals with data encrypted by the other parties using an additive ho- 
momorphic cryptosystem. We augment the protocol with randomization 
and obfuscation to make it difficult for any party to estimate properties 
of the data belonging to other parties from the intermediate steps.
Previous approaches to this problem were based on expensive QR
decomposition of correlation matrices; we present an efficient algorithm
using the power iteration method. We analyze the protocol for correctness,
security, and efficiency.

1 Introduction 

Eigenvector computation is one of the most basic tools of data analysis. In any 
multivariate dataset, the eigenvectors provide information about key trends in 
the data, as well as the relative importance of the different variables. These find 
use in a diverse set of applications, including principal component analysis [7], 
collaborative filtering [4] and PageRank [8]. Not all eigenvectors of the data are
equally important; only those corresponding to the highest eigenvalues are used 
as representations of trends in the data. The most important eigenvector is the 
principal eigenvector corresponding to the maximum eigenvalue. 

In many scenarios, the entity that actually computes the eigenvectors is dif- 
ferent from the entities that possess the data. For instance, a data mining agency 
may desire to compute the eigenvectors of a distributed set of records, or an en- 
terprise providing recommendations may want to compute eigenvectors from the 
personal ratings of subscribers to facilitate making recommendations to new cus- 
tomers. We will refer to such entities as arbitrators. Computation of eigenvectors 
requires the knowledge of either the data from the individual parties or the cor- 
relation matrix derived from it. The parties that hold the data may however 
consider them private and be unwilling to expose any aspect of their individual 
data to either the arbitrator or to other parties, while being agreeable, in princi- 
ple, to contribute to the computation of a global trend. As a result, we require a 
privacy preserving algorithm that can compute the eigenvectors of the aggregate 
data while maintaining the necessary privacy of the individual data providers. 

The common approach to this type of problem is to obfuscate individual data
through controlled randomization [5]. However, since we desire our estimates to
be exact, simple randomization methods that merely ensure accuracy in the
mean cannot be employed. Han et al. [6] address the problem by computing
the complete QR decomposition [5] of privately shared data using cryptographic
primitives. This enables all parties to collaboratively compute the complete set
of global eigenvectors but does not truly hide the data from individual sources.
Given the complete set of eigenvectors and eigenvalues provided by the QR
decomposition, any party can reverse engineer the correlation matrix for the
data from the remaining parties and compute trends among them. Canny [2]
presents a different distributed approach that does employ an arbitrator, in that
case a blackboard. However, although individual data instances are hidden, both
the arbitrator and individual parties have access to all aggregated individual
stages of the computation and the final result is public, which is much less
stringent than our privacy constraints.

In this paper, we propose a new privacy-preserving protocol for shared
computation of the principal eigenvector of a distributed collection of privately held
data. The algorithm is designed such that the individual parties, whom we will
refer to as "Alice" and "Bob", learn nothing about each other's data, and only
learn the degree to which their own data follow the global trend indicated by
the principal eigenvector. The arbitrator, whom we call "Trent", coordinates the
computation but learns nothing about the data of the individual parties besides
the principal eigenvector, which he receives at the end of the computation. In
our presentation, for simplicity, we initially consider two parties each having an
individual data matrix. Later we show that the protocol can be naturally
generalized to $N$ parties. As the $N$ parties communicate only with Trent in a star
network topology with $O(N)$ data transmissions, this is much more efficient than
the $O(N^2)$ data transmission cost if all parties communicated with each other
in a fully connected network. The data may be split in two possible ways: along
data instances or features. In this work, we principally consider the data-split
case. However, our algorithm is easily applied to feature-split data as well.

We use the power iteration method [5] to compute the principal eigenvector. 
The arbitrator Trent introduces a combination of homomorphic encryption [9] , 
randomization, and obfuscation to ensure that the computation preserves pri- 
vacy. The algorithm assumes the parties to be semi-honest. While they are as- 
sumed to follow the protocol correctly and refrain from using falsified data as 
input, they may record and analyze the intermediate results obtained while
following the protocol in order to gain as much information as possible. It is
required that no party colludes with Trent as this will compromise the privacy 
of the protocol. 

The computational requirements of the algorithm are the same as that of 
the power iteration method. In addition, each iteration requires the encryption 
and decryption of two k dimensional vectors, where k is the dimensionality of 
the data, as well as transmission of the encrypted vectors to and from Trent. 
Nevertheless, the encryption and transmission overhead, which is linear in $k$, may
be expected to be significantly lower than that of calculating the QR decomposition
or similar methods which require repeated transmission of entire matrices. In
general, the computational cost of the protocol is dependent on the degree of 
security we desire as required by the application. 



2 Preliminaries 

2.1 Power Iteration Method 

The power iteration method [5] is an algorithm to find the principal eigenvector
and its associated eigenvalue for square matrices. To simplify explanation, we
assume that the matrix is diagonalizable with real eigenvalues, although the
algorithm is applicable to general square matrices as well [11]. Let $A$ be an
$N \times N$ matrix whose eigenvalues are $\lambda_1, \ldots, \lambda_N$.

The power iteration method computes the principal eigenvector of $A$ through
the iteration

$$x_{n+1} = \frac{A x_n}{\|A x_n\|},$$

where $x_n$ is an $N$-dimensional vector. If the principal eigenvalue is unique, the
series $A^n x_0$, suitably normalized, is guaranteed to converge to a scaling of the
principal eigenvector. In the standard algorithm, $\ell_2$ normalization is used to
prevent the magnitude of the vector from overflow and underflow. Other
normalization factors can also be used if they do not change the limit of the series.

We assume wlog that $|\lambda_1| > \cdots > |\lambda_N| > 0$. Let $v_i$ be the normalized
eigenvector corresponding to $\lambda_i$. Since $A$ is assumed to be diagonalizable, the
eigenvectors $\{v_1, \ldots, v_N\}$ form a basis for $\mathbb{R}^N$. For unique coefficients
$c \in \mathbb{R}^N$, any vector $x_0 \in \mathbb{R}^N$ can be written as $x_0 = \sum_{i=1}^{N} c_i v_i$.
It can be shown that $\frac{1}{\lambda_1^n} A^n x_0$ is asymptotically equal to $c_1 v_1$, which
forms the basis of the power iteration method, and the convergence rate of the
algorithm is $\left|\frac{\lambda_2}{\lambda_1}\right|$. The algorithm converges quickly when there is no
eigenvalue close in magnitude to the principal eigenvalue.
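
The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation; the function name `power_iteration`, the stopping tolerance, and the 2 x 2 test matrix are assumptions chosen for the example.

```python
import numpy as np

def power_iteration(A, num_iters=500, tol=1e-12, seed=0):
    """Estimate the dominant eigenpair of a square matrix A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = A @ x
        y /= np.linalg.norm(y)      # l2 normalization keeps the iterate bounded
        # Compare up to sign: a negative dominant eigenvalue flips the iterate.
        converged = min(np.linalg.norm(y - x), np.linalg.norm(y + x)) < tol
        x = y
        if converged:
            break
    eigenvalue = x @ A @ x          # Rayleigh quotient of the unit vector x
    return eigenvalue, x

# Symmetric test matrix with eigenvalues 3 and 1 (eigenvectors [1, 1] and [1, -1]).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, v = power_iteration(A)
```

With this test matrix the eigenvalue ratio is 1/3, so the iterate settles within a few dozen steps; a ratio close to 1 would need many more.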



2.2 Homomorphic Encryption 

A homomorphic encryption algorithm allows operations to be performed on
the encrypted data without knowledge of the unencrypted values. If $\cdot$ and
$+$ are two operators and $x$ and $y$ are two plaintext elements, a homomorphic
encryption function $E$ satisfies

$$E[x] \cdot E[y] = E[x + y].$$

In this work, we use the additive homomorphic Paillier asymmetric key
cryptosystem [9].
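
To make the additive property concrete, here is a toy Paillier sketch in pure Python. It is for illustration only: the primes are far too small to be secure, the generator is fixed to $g = n + 1$ (a common simplification), and the helper names `keygen`, `encrypt`, and `decrypt` are assumptions of this example, not part of the paper.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy key generation from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)        # modular inverse; valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """E[m] = (n + 1)^m * r^n mod n^2 with fresh randomness r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

rng = random.Random(0)
n, priv = keygen()
cx, cy = encrypt(n, 1234, rng), encrypt(n, 4321, rng)
c_sum = (cx * cy) % (n * n)     # multiplying ciphertexts adds the plaintexts
total = decrypt(n, priv, c_sum)
```

Only the public modulus `n` is needed to encrypt or to combine ciphertexts; decryption requires the private pair, which is the property the protocol relies on when an arbitrator aggregates encrypted values.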

3 Privacy Preserving Protocol

3.1 Data Setup and Privacy Requirements 

We formally define the problem, in which multiple parties try to compute the
principal eigenvector over their collectively held datasets without disclosing any
information to each other. For simplicity, we describe the problem with two
parties, Alice and Bob, and later show that the algorithm is easily extended to
multiple parties.

The parties Alice and Bob are assumed to be semi-honest, which means that
the parties will follow the steps of the protocol correctly and will not try to cheat 
by passing falsified data aimed at extracting information about other parties. The 
parties are assumed to be curious in the sense that they may record the outcomes 
of all intermediate steps of the protocol to extract any possible information. 
The protocol is coordinated by the semi-honest arbitrator Trent. Alice and Bob 
communicate directly with Trent rather than each other. Trent performs all 
the intermediate computations and transfers the results to each party. Although 
Trent is trusted not to collude with other parties, it is important to note that the 
parties do not trust Trent with their data and intend to prevent him from being 
able to see it. Alice and Bob hide information by using a shared key cryptosystem 
to send only encrypted data to Trent. 

We assume that both the datasets can be represented as matrices in which 
columns and rows correspond to the data samples and the features, respectively. 
For instance, the individual email collections of Alice and Bob are represented as 
matrices A and B respectively, in which the columns correspond to the emails, 
and the rows correspond to the words. The entries of these matrices represent the 
frequency of occurrence of a given word in a given email. The combined dataset 
may be split between Alice and Bob in two possible ways. In a data split, both 
Alice and Bob have a disjoint set of data samples with the same features. The 
aggregate dataset is obtained by concatenating columns, giving the data matrix
$M = [A \;\; B]$ and correlation matrix $M^T M$. In a feature split, Alice and Bob have
different features of the same data. The aggregate data matrix $M$ is obtained by
concatenating rows, giving the data matrix $M = \begin{bmatrix} A \\ B \end{bmatrix}$ and correlation matrix
$M M^T$. If $v$ is an eigenvector of $M^T M$ with a non-zero eigenvalue $\lambda$, we have

$$M^T M v = \lambda v \;\Rightarrow\; M M^T M v = \lambda M v.$$

Therefore, $Mv$ is an eigenvector of $M M^T$ with eigenvalue $\lambda$. Similarly, any
eigenvector of the horizontally split (feature-split) data $M M^T$ associated with a
non-zero eigenvalue corresponds to an eigenvector of the vertically split
(data-split) data $M^T M$ with the same eigenvalue. Hence, we mainly deal with
calculating the principal eigenvector of the vertically split data. In practice the
correlation matrix that has the smaller size should be used to reduce the
computational cost of eigen-decomposition algorithms.
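
The correspondence between the two correlation matrices can be checked numerically. The snippet below is a small sanity check on random data; the matrix sizes and variable names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 6))            # 4 features, 6 data samples

# Principal eigenpair of the 6 x 6 matrix M^T M (eigh sorts eigenvalues ascending).
vals, vecs = np.linalg.eigh(M.T @ M)
lam, v = vals[-1], vecs[:, -1]

# M v should be an eigenvector of the 4 x 4 matrix M M^T with the same eigenvalue.
w = M @ v
residual = np.linalg.norm((M @ M.T) @ w - lam * w)

# The largest eigenvalue of the smaller matrix should coincide with lam.
top_small = np.linalg.eigvalsh(M @ M.T)[-1]
```

Here the 4 x 4 matrix is the cheaper one to decompose, which is the practical point made above.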

For vertical data split, if Alice's data $A$ is of size $k \times m$ and Bob's data $B$
is of size $k \times n$, the combined data matrix will be $M_{k \times (m+n)}$. The correlation
matrix of size $(m + n) \times (m + n)$ is given by

$$M^T M = \begin{bmatrix} A^T A & A^T B \\ B^T A & B^T B \end{bmatrix}.$$


3.2 The Basic Protocol 



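The power iteration algorithm computes the principal eigenvector of $M^T M$ by
updating and normalizing the vector $x_i$ until convergence. Starting with a
random vector $x_0$, we calculate

$$x_{i+1} = \frac{M^T M x_i}{\|M^T M x_i\|}.$$

For privacy, we split the vector $x_i$ into two parts, $\alpha_i$ and $\beta_i$. $\alpha_i$ corresponds to
the first $m$ components of $x_i$ and $\beta_i$ corresponds to the remaining $n$ components.
In each iteration, we need to securely compute

$$M^T M x_i = \begin{bmatrix} A^T A & A^T B \\ B^T A & B^T B \end{bmatrix} \begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} = \begin{bmatrix} A^T (A\alpha_i + B\beta_i) \\ B^T (A\alpha_i + B\beta_i) \end{bmatrix} = \begin{bmatrix} A^T u_i \\ B^T u_i \end{bmatrix}, \tag{1}$$

where $u_i = A\alpha_i + B\beta_i$. After convergence, $\alpha_i$ and $\beta_i$ will represent shares held
by Alice and Bob of the principal eigenvector of $M^T M$.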




Fig. 1. Visual description of the protocol. 



This now lays the groundwork for us to define a distributed protocol in which
Alice and Bob work only on their portions of the data, while computing the
principal eigenvector of the combined data in collaboration with a third party, Trent.
An iteration of the algorithm proceeds as illustrated in Fig. 1. At the outset, Alice
and Bob randomly generate component vectors $\alpha_0$ and $\beta_0$ respectively. At the
beginning of the $i$-th iteration, Alice and Bob possess component vectors $\alpha_i$ and
$\beta_i$ respectively. They compute the product of their data and their corresponding
component vectors as $A\alpha_i$ and $B\beta_i$. To compute $u_i$, Alice and Bob individually
transfer these products to Trent. Trent adds the contributions from Alice and
Bob by computing

$$u_i = A\alpha_i + B\beta_i.$$

He then transfers $u_i$ back to Alice and Bob, who then individually compute $A^T u_i$
and $B^T u_i$, without requiring data from each other. For normalization, Alice and
Bob also need to securely compute the term

$$\|M^T M x_i\| = \sqrt{\|A^T u_i\|^2 + \|B^T u_i\|^2}. \tag{2}$$

Again, Alice and Bob compute the individual terms $\|A^T u_i\|^2$ and $\|B^T u_i\|^2$
respectively and transfer them to Trent. As earlier, Trent computes the sum

$$\|A^T u_i\|^2 + \|B^T u_i\|^2$$

and transfers it back to Alice and Bob. Finally, Alice and Bob respectively update
the $\alpha$ and $\beta$ vectors as

$$
\begin{aligned}
u_i &= A\alpha_i + B\beta_i, \\
\alpha_{i+1} &= \frac{A^T u_i}{\sqrt{\|A^T u_i\|^2 + \|B^T u_i\|^2}}, \\
\beta_{i+1} &= \frac{B^T u_i}{\sqrt{\|A^T u_i\|^2 + \|B^T u_i\|^2}}.
\end{aligned} \tag{3}
$$

The algorithm terminates when the $\alpha$ and $\beta$ vectors converge.
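
The message flow of the basic protocol can be simulated in plaintext as follows. This is a sketch under stated assumptions: all three roles run in one process, nothing is encrypted, the function name `basic_protocol` is hypothetical, and the test data has a planted dominant direction so that the iteration converges quickly.

```python
import numpy as np

def basic_protocol(A, B, num_iters=200, seed=0):
    """Plaintext simulation of the two-party iteration in Equation (3)."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(A.shape[1])    # Alice's share of x
    beta = rng.standard_normal(B.shape[1])     # Bob's share of x
    for _ in range(num_iters):
        # Step 1: Alice sends A @ alpha, Bob sends B @ beta; Trent adds them.
        u = A @ alpha + B @ beta
        # Step 2: each party computes its product and squared norm locally;
        # Trent adds the two squared norms.
        a_part, b_part = A.T @ u, B.T @ u
        norm = np.sqrt(a_part @ a_part + b_part @ b_part)
        # Step 3: local updates.
        alpha, beta = a_part / norm, b_part / norm
    return alpha, beta

rng = np.random.default_rng(1)
k, m, n = 5, 4, 3
# Planted rank-one component plus small noise (an assumption of this demo).
M = 5.0 * np.outer(rng.standard_normal(k), rng.standard_normal(m + n))
M += 0.2 * rng.standard_normal((k, m + n))
A, B = M[:, :m], M[:, m:]

alpha, beta = basic_protocol(A, B)
x = np.concatenate([alpha, beta])
v = np.linalg.eigh(M.T @ M)[1][:, -1]          # reference principal eigenvector
alignment = abs(x @ v)                          # close to 1 when x is parallel to v
```

Neither party ever touches the other's matrix in this loop; only `u` and the summed squared norm cross the party boundary.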



3.3 Making the Protocol More Secure 

The basic protocol described above is provably correct. After convergence, Alice 
and Bob end up with the principal eigenvector of the row space of the combined 
data, as well as concatenative shares of the column space which Trent can gather 
to compute the principal eigenvector. However, the protocol is not secure; Alice
and Bob obtain sufficient information about properties of each other's data
matrices, such as their column spaces, null spaces, and correlation matrices. We
present a series of modifications to the basic protocol so that such information
is not revealed.



Homomorphic Encryption: Securing the data from Trent. The central
objective of the protocol is to prevent Trent from learning anything about either
the individual data sets or the combined data other than the principal eigenvector
of the combined data. Trent receives a series of partial results of the form
$AA^T u$, $BB^T u$ and $MM^T u$. By analyzing these results, he can potentially
determine the entire column spaces of Alice and Bob as well as the combined data.
To prevent this, we employ an additive homomorphic cryptosystem introduced
in Section 2.2.

At the beginning of the protocol, Alice and Bob obtain a shared public
key/private key pair for an additive homomorphic cryptosystem from an
authenticating authority. The public key is also known to Trent who, however, does not
know the private key: while he can encrypt data, he cannot decrypt it. Alice
and Bob encrypt all transmissions to Trent. At the first transmission step of each
iteration, Trent receives the encrypted inputs $E[A\alpha_i]$ and $E[B\beta_i]$. He multiplies
the two element by element to compute $E[A\alpha_i] \cdot E[B\beta_i] = E[A\alpha_i + B\beta_i] = E[u_i]$.
He returns $E[u_i]$ to both Alice and Bob, who decrypt it with their private key to
obtain $u_i$. In the second transmission step of each iteration, Alice and Bob send
$E[\|A^T u_i\|^2]$ and $E[\|B^T u_i\|^2]$ respectively to Trent, who computes the encrypted
sum

$$E[\|A^T u_i\|^2] \cdot E[\|B^T u_i\|^2] = E[\|A^T u_i\|^2 + \|B^T u_i\|^2]$$

and transfers it back to Alice and Bob, who then decrypt it to obtain
$\|A^T u_i\|^2 + \|B^T u_i\|^2$, which is required for normalization.

This modification does not change the actual computation of the power
iterations in any manner. Thus the procedure remains as correct as before, except
that Trent now no longer has any access to any of the intermediate
computations. At the termination of the algorithm he can now receive the converged
values of $\alpha$ and $\beta$ from Alice and Bob, who will send them in clear text.



Random Scaling: Securing the Column Spaces. After Alice and Bob receive
$u_i = A\alpha_i + B\beta_i$ from Trent, Alice can calculate $u_i - A\alpha_i = B\beta_i$ and Bob
can calculate $u_i - B\beta_i = A\alpha_i$. After a sufficient number of iterations, particularly
in the early stages of the computation (when $u_i$ has not yet converged), Alice can
find the column space of $B$ and Bob can find the column space of $A$. Similarly,
by subtracting their share from the normalization term returned by Trent, Alice
and Bob are able to find $\|B^T u_i\|^2$ and $\|A^T u_i\|^2$ respectively.

In order to prevent this, Trent multiplies $u_i$ with a randomly generated scaling
term $r_i$ that he does not share with anyone. Trent computes

$$\left(E[A\alpha_i] \cdot E[B\beta_i]\right)^{r_i} = E[r_i (A\alpha_i + B\beta_i)] = E[r_i u_i]$$

by performing element-wise exponentiation of the encrypted vector by $r_i$ and
transfers $E[r_i u_i]$ to Alice and Bob. By using a different value of $r_i$ at each
iteration, Trent ensures that Alice and Bob are not able to calculate $B\beta_i$ and $A\alpha_i$
respectively. In the second step, Trent scales the normalization constant by $r_i^2$,

$$\left(E[\|A^T u_i\|^2] \cdot E[\|B^T u_i\|^2]\right)^{r_i^2} = E\left[r_i^2 \left(\|A^T u_i\|^2 + \|B^T u_i\|^2\right)\right].$$

Normalization causes the factor $r_i$ to cancel out and the update rules remain
unchanged.

$$
\begin{aligned}
u_i &= A\alpha_i + B\beta_i, \\
\alpha_{i+1} &= \frac{r_i A^T u_i}{\sqrt{r_i^2 \left(\|A^T u_i\|^2 + \|B^T u_i\|^2\right)}} = \frac{A^T u_i}{\sqrt{\|A^T u_i\|^2 + \|B^T u_i\|^2}}, \\
\beta_{i+1} &= \frac{r_i B^T u_i}{\sqrt{r_i^2 \left(\|A^T u_i\|^2 + \|B^T u_i\|^2\right)}} = \frac{B^T u_i}{\sqrt{\|A^T u_i\|^2 + \|B^T u_i\|^2}}.
\end{aligned}
$$

The random scaling does not affect the final outcome of the computation, and
the algorithm remains correct as before.
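
The cancellation of the scaling factor is easy to confirm numerically. The check below works on plaintext values with a real-valued factor; in the encrypted protocol the exponent would be an integer, and the helper name `update` is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, n = 5, 4, 3
A, B = rng.standard_normal((k, m)), rng.standard_normal((k, n))
alpha, beta = rng.standard_normal(m), rng.standard_normal(n)

def update(A, B, u, norm_sq):
    """Local updates of Alice and Bob from the vector and squared norm they receive."""
    return A.T @ u / np.sqrt(norm_sq), B.T @ u / np.sqrt(norm_sq)

u = A @ alpha + B @ beta
norm_sq = np.sum((A.T @ u) ** 2) + np.sum((B.T @ u) ** 2)

r = rng.uniform(2.0, 100.0)            # Trent's secret scaling factor r_i
plain = update(A, B, u, norm_sq)
scaled = update(A, B, r * u, r ** 2 * norm_sq)

# Without knowing r, Alice's subtraction no longer isolates Bob's product.
leak = r * u - A @ alpha
```

The updates agree with and without scaling, while `leak` equals $(r - 1) A\alpha_i + r B\beta_i$ rather than $B\beta_i$.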



Data Padding: Securing null spaces. In each iteration, Alice observes one
vector $r_i u_i = r_i (A\alpha_i + B\beta_i)$ in the column space of $M = [A \;\; B]$. Alice can
calculate the left null space $H(A)$ of $A$, given by

$$H(A) = \{x \in \mathbb{R}^k \mid x^T A = 0\},$$

and pre-multiply $r_i u_i$ with a non-zero vector $x \in H(A)$ to calculate

$$x^T r_i u_i = r_i x^T (A\alpha_i + B\beta_i) = r_i x^T B\beta_i.$$

This is a projection of $B\beta_i$, a vector in the column space of $B$, into the null space
$H(A)$. Similarly, Bob can find projections of $A\alpha_i$ in the null space $H(B)$. While
considering the projected vectors separately will not give away much information,
after several iterations Alice will have a projection of the column space of $B$ on
the null space of $A$, thereby learning about the components of Bob's data that
lie in her null space. Bob can similarly learn about the components of Alice's
data that lie in his null space.

In order to prevent this, Alice participates in the protocol with a padded
matrix $[A \;\; P_a]$ as input, created by concatenating her data matrix $A$ with a
random matrix $P_a = r_a I_{k \times k}$, where $r_a$ is a positive scalar chosen by Alice.
Similarly, Bob uses a padded matrix $[B \;\; P_b]$ created by concatenating his data
matrix $B$ with $P_b = r_b I_{k \times k}$, where $r_b$ is a different positive scalar chosen by Bob.
This has the effect of hiding the null spaces in both their data sets. The following
lemma shows that the eigenvectors of the combined data do not change when
using padded matrices. Please refer to the appendix for the proof.



Lemma 1. Let $\bar{M} = [M \;\; P]$ where $M$ is an $s \times t$ matrix, and $P$ is an $s \times s$
orthogonal matrix. If $\bar{v} = \begin{bmatrix} v_{t \times 1} \\ v'_{s \times 1} \end{bmatrix}$ is an eigenvector of $\bar{M}^T \bar{M}$ corresponding to
an eigenvalue $\lambda$, then $v$ is an eigenvector of $M^T M$.

While the random factors $r_a$ and $r_b$ prevent Alice and Bob from estimating the
eigenvalues of the data, the computation of principal eigenvector remains correct
as before.
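
Lemma 1 can be checked numerically on random data. In this sketch the orthogonal padding matrix is drawn from a QR factorization, which is an assumption of the example (the protocol itself pads with scaled identity matrices), and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
s, t = 4, 6
M = rng.standard_normal((s, t))
P, _ = np.linalg.qr(rng.standard_normal((s, s)))    # random s x s orthogonal matrix
M_bar = np.hstack([M, P])                           # padded matrix [M P]

# Principal eigenpair of the padded correlation matrix.
vals_bar, vecs_bar = np.linalg.eigh(M_bar.T @ M_bar)
lam_bar, v_bar = vals_bar[-1], vecs_bar[:, -1]
v = v_bar[:t]                                       # first t components

# The proof of the lemma gives M^T M v = (lam_bar - 1) v.
residual = np.linalg.norm(M.T @ M @ v - (lam_bar - 1.0) * v)
top = np.linalg.eigvalsh(M.T @ M)[-1]
```

The eigenvalue is shifted by one while the direction of `v` is preserved, so a party holding only the padded result cannot read off the unpadded eigenvalue without knowing the padding scale.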

Obfuscation: Securing Krylov spaces. For a constant $c$, we can show that
the vector $u_i = A\alpha_i + B\beta_i$ is equal to $c\, M M^T u_{i-1}$. The sequence of vectors
$U = \{u_1, u_2, u_3, \ldots\}$ forms the Krylov subspace $(MM^T)^n u_1$ of the matrix
$MM^T$. Knowledge of this series of vectors can reveal all eigenvectors of $MM^T$.
Consider $u_0 = c_1 v_1 + c_2 v_2 + \cdots$, where $v_i$ is the $i$-th eigenvector. If $\lambda_j$ is the
$j$-th eigenvalue, we have $u_1 = c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \cdots$. We assume wlog that the
eigenvalues $\lambda$ are in descending order, i.e., $\lambda_j > \lambda_k$ for $j < k$. Let $u_{conv}$ be
the normalized converged value of $u_i$, which is equal to the normalized principal
eigenvector $v_1$.

Let $w_1 = u_1 - (u_1 \cdot u_{conv})\, u_{conv}$, which can be shown to be equal to
$c_2 \lambda_2 v_2 + c_3 \lambda_3 v_3 + \cdots$, i.e., a vector with no component along $v_1$. If we perform power
iterations with initial vector $w_1$, the converged vector $w_{conv}$ will be equal to the
eigenvector corresponding to the second largest eigenvalue. Hence, once Alice
has the converged value $u_{conv}$, she can subtract it out of all the stored $u_i$ values
and determine the second principal eigenvector of $MM^T$. She can repeat the
process iteratively to obtain all eigenvectors of $MM^T$, although in practice the
estimates become noisy very quickly. As we will show in Section 4, the following
modification prevents Alice and Bob from identifying the Krylov space with any
certainty and they are thereby unable to compute the additional eigenvectors of
the combined data.
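
The leakage described above can be reproduced on a small synthetic example. The sketch below uses a hypothetical, well separated spectrum and a starting vector with equal weight on every eigenvector so that the effect is clearly visible; the matrix `S` stands in for $MM^T$, and all names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal eigenvectors (columns)
eigvals = np.array([1.0, 0.5, 0.1, 0.05, 0.01])    # hypothetical spectrum
S = Q @ np.diag(eigvals) @ Q.T                     # stands in for M M^T

u = Q @ np.ones(5)                                 # u_0 with c_1 = ... = c_5 = 1
stored = []                                        # what a curious party records
for _ in range(60):
    u = S @ u
    u /= np.linalg.norm(u)
    stored.append(u.copy())

u_conv = stored[-1]                                # converged vector, parallel to v_1
u_early = stored[9]                                # an early, unconverged iterate
w = u_early - (u_early @ u_conv) * u_conv          # remove the component along v_1
w /= np.linalg.norm(w)

alignment_first = abs(u_conv @ Q[:, 0])
alignment_second = abs(w @ Q[:, 1])                # close to 1: v_2 is exposed
```

The deflated iterate is dominated by the second eigenvector because the remaining components decay faster; with closely spaced eigenvalues the estimate would be much noisier, as noted in the text.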

We introduce a form of obfuscation; we assume that Trent stores the
encrypted results of intermediate steps at every iteration. After computing $E[r_i u_i]$,
Trent either sends this quantity to Alice and Bob with a probability $p$ or sends
a random vector $E[u'_i]$ of the same size ($k \times 1$) with probability $1 - p$. As the
encryption key of the cryptosystem is publicly known, Trent can encrypt the
vector $u'_i$. Alice and Bob do not know whether they are receiving $r_i u_i$ or $u'_i$. If a
random vector is sent, Trent continues with the protocol, but ignores the terms
Alice and Bob return in the next iteration, $E[A\alpha_{i+1}]$ and $E[B\beta_{i+1}]$. Instead, he
sends the result of the last non-random iteration $j$, $E[r_j u_j]$, thereby restarting
that iteration.

This sequence of data sent by Trent is an example of a Bernoulli process [10].
An illustrative example of the protocol is shown in Fig. 2. In the first two
iterations, Trent sends valid vectors $r_1 u_1$ and $r_2 u_2$ back to Alice and Bob. In the
beginning of the third iteration, Trent receives and computes $E[r_3 u_3]$ but sends
a random vector $u'_3$. He ignores what Alice and Bob send him in the fourth
iteration and sends back $E[r_3 u_3]$ instead. Trent then stores the vector $E[r_4 u_4]$ sent
by Alice and Bob in the fifth iteration and sends a random vector $u'_5$. Similarly,
he ignores the computed vector of the sixth iteration and sends $u'_6$. Finally, he
ignores the computed vector of the seventh iteration and sends $E[r_4 u_4]$.

This modification has two effects. First, it prevents Alice and Bob from
identifying the Krylov space with certainty. As a result, they are now unable to
obtain additional eigenvectors from the data. Second, the protocol essentially
obfuscates the projection of the column space of $B$ on to the null space of $A$ for
Alice, and analogously for Bob, by introducing random vectors. As Alice and Bob
do not know which vectors are random, they cannot completely calculate the true
projection of each other's data on the null spaces. This is rendered less important
if Alice and Bob pad their data as suggested in the previous subsection.

Fig. 2. An example of the protocol execution with obfuscation.

Alice and Bob can store the vectors they receive from Trent in each
iteration. By analyzing the distribution of the normalized vectors, Alice and Bob can
identify the random vectors using a simple outlier detection technique. To
prevent this, one possible solution is for Trent to pick a previously computed value
of $r_j u_j$ and add zero mean noise $e_i$, for instance, sampled from the Gaussian
distribution:

$$u'_i = r_j u_j + e_i, \quad e_i \sim N(0, \sigma^2).$$

Instead of transmitting a perturbation of a previous vector, Trent can also use
a noise-perturbed mean of a few previous $r_j u_j$. Doing this will create a
random vector with the same distributional properties as the real vectors. The
noise variance parameter $\sigma^2$ controls both the error in distinguishing the random
vectors from the valid vectors and the amount of error introduced into the
projected column space.

Obfuscation has the effect of increasing the total computation, as every
iteration in which Trent sends a random vector is wasted. In any secure multi-party
computation, there is an inherent trade-off between computation time and the
degree of security. The parameter $p$, which is the probability of Trent sending a
non-random vector, allows us to control this at a fine level based on the
application requirements. As before, introducing obfuscation does not affect the
correctness of the computation: it does not modify the values of the non-random
vectors $u_i$.
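
The obfuscated schedule can be simulated in plaintext to see that the wasted rounds do not disturb the result. This is a sketch under several assumptions: encryption is omitted, the decoys are plain Gaussian vectors rather than perturbed copies of earlier vectors, the bookkeeping variable `result` exists only so the simulation can report the last genuine update, and the function name `obfuscated_protocol` is hypothetical.

```python
import numpy as np

def obfuscated_protocol(A, B, p=0.6, rounds=400, seed=0):
    """Two-party iteration in which Trent sends a decoy with probability 1 - p."""
    rng = np.random.default_rng(seed)
    k = A.shape[0]
    alpha = rng.standard_normal(A.shape[1])
    beta = rng.standard_normal(B.shape[1])
    result = (alpha, beta)
    pending = None             # last valid r_j u_j held by Trent
    last_was_valid = True
    for _ in range(rounds):
        if last_was_valid:
            # The parties' inputs derive from a valid vector, so Trent uses them.
            r = rng.uniform(0.5, 2.0)
            pending = r * (A @ alpha + B @ beta)
        # Otherwise Trent ignores the inputs and keeps the pending vector.
        send_valid = rng.random() < p
        u = pending if send_valid else rng.standard_normal(k)
        # Alice and Bob cannot tell the difference and update as usual.
        a_part, b_part = A.T @ u, B.T @ u
        norm = np.sqrt(a_part @ a_part + b_part @ b_part)
        alpha, beta = a_part / norm, b_part / norm
        if send_valid:
            result = (alpha, beta)
        last_was_valid = send_valid
    return result

rng = np.random.default_rng(1)
k, m, n = 5, 4, 3
# Planted rank-one component plus small noise (an assumption of this demo).
M = 5.0 * np.outer(rng.standard_normal(k), rng.standard_normal(m + n))
M += 0.2 * rng.standard_normal((k, m + n))
A, B = M[:, :m], M[:, m:]

alpha, beta = obfuscated_protocol(A, B)
v = np.linalg.eigh(M.T @ M)[1][:, -1]
alignment = abs(np.concatenate([alpha, beta]) @ v)
```

Lowering `p` leaves the converged vector unchanged but increases the number of rounds needed, which is the trade-off controlled by this parameter.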

3.4 Extension to Multiple Parties 

As we mentioned before, the protocol can be naturally extended to multiple
parties. Let us consider the case of $N$ parties: $P_1, \ldots, P_N$, each having data
$A_1, \ldots, A_N$ of sizes $k \times n_1, \ldots, k \times n_N$ respectively. The parties are interested
in computing the principal eigenvector of the combined data without disclosing
anything about their data. We make the same assumption about the parties
and the arbitrator Trent being semi-honest. All the parties except Trent share
the decryption key to the additive homomorphic encryption scheme and the
encryption key is public.

In case of a data split, for the combined data matrix $M = [A_1 \;\; A_2 \;\; \cdots \;\; A_N]$,
the correlation matrix is

$$M^T M = \begin{bmatrix} A_1^T A_1 & \cdots & A_1^T A_N \\ \vdots & \ddots & \vdots \\ A_N^T A_1 & \cdots & A_N^T A_N \end{bmatrix}.$$


We split the eigenvector into $N$ parts, $\alpha_1, \ldots, \alpha_N$, of size $n_1, \ldots, n_N$
respectively, each corresponding to one party. For simplicity, we describe the basic
protocol with homomorphic encryption; randomization and obfuscation can be
easily added by making the same modifications as we saw in Section 3.3. One
iteration of the protocol starts with the $i$-th party computing $A_i \alpha_i$ and
transferring to Trent the encrypted vector $E[A_i \alpha_i]$. Trent receives this from each party
and computes

$$\prod_i E[A_i \alpha_i] = E\left[\sum_i A_i \alpha_i\right] = E[u],$$

where $u = \sum_i A_i \alpha_i$, and the product is an element-wise operation. Trent sends
the encrypted vector $E[u]$ back to $P_1, \ldots, P_N$, who decrypt it and individually
compute $A_i^T u$. The parties individually compute $\|A_i^T u\|^2$ and send its encrypted
value to Trent. Trent receives $N$ encrypted scalars $E[\|A_i^T u\|^2]$ and calculates
the normalization term

$$\prod_i E[\|A_i^T u\|^2] = E\left[\sum_i \|A_i^T u\|^2\right]$$

and sends it back to the parties. At the end of the iteration, the party $P_i$ updates
$\alpha_i$ as

$$
\begin{aligned}
u &= \sum_i A_i \alpha_i^{(old)}, \\
\alpha_i^{(new)} &= \frac{A_i^T u}{\sqrt{\sum_j \|A_j^T u\|^2}}.
\end{aligned} \tag{5}
$$

The algorithm terminates when any one party $P_i$ converges on $\alpha_i$.
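
The multi-party update generalizes the two-party loop directly. The sketch below simulates it in plaintext for three parties; the function name `multiparty_protocol`, the party sizes, and the planted test data are assumptions of this example, and encryption is omitted.

```python
import numpy as np

def multiparty_protocol(blocks, num_iters=200, seed=0):
    """Plaintext simulation of Equation (5) for a list of data blocks A_1..A_N."""
    rng = np.random.default_rng(seed)
    shares = [rng.standard_normal(A_i.shape[1]) for A_i in blocks]
    for _ in range(num_iters):
        # Each party sends A_i @ alpha_i; Trent adds the N vectors.
        u = sum(A_i @ a_i for A_i, a_i in zip(blocks, shares))
        # Each party computes A_i^T u locally and sends its squared norm.
        parts = [A_i.T @ u for A_i in blocks]
        norm = np.sqrt(sum(part @ part for part in parts))
        shares = [part / norm for part in parts]
    return shares

rng = np.random.default_rng(2)
k, sizes = 5, [2, 3, 4]
# Planted rank-one component plus small noise (an assumption of this demo).
M = 5.0 * np.outer(rng.standard_normal(k), rng.standard_normal(sum(sizes)))
M += 0.2 * rng.standard_normal((k, sum(sizes)))
blocks = np.split(M, np.cumsum(sizes)[:-1], axis=1)   # column blocks A_1, A_2, A_3

shares = multiparty_protocol(blocks)
x = np.concatenate(shares)
v = np.linalg.eigh(M.T @ M)[1][:, -1]
alignment = abs(x @ v)
```

Each party exchanges one vector and one scalar with Trent per iteration regardless of how many parties take part, which matches the star topology described in the introduction.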



4 Analysis 



4.1 Correctness 



The protocol outlined in Section 3.2 is provably correct. The steps introduced in
Section 3.3 do not modify the operation and hence the accuracy of the protocol
in any manner.

4.2 Security

As a consequence of the procedures introduced in Section 3.3, the row spaces
and null spaces of the parties are hidden from each other. In the multiparty
scenario, the protocol is also robust to collusion between parties with data,
although not to collusion between Trent and any of the other parties. If two
parties out of $N$ collude, they will find information about each other, but will
not learn anything about the data of the remaining $N - 2$ parties.

What remains is the information which can be obtained from the sequence
of $u_i$ vectors. Alice receives the following two sets of vectors:

$$U = \{u_1, u_2, u_3, \ldots\}, \quad U' = \{u'_1, u'_2, \ldots\}$$

representing the outcomes of valid iterations and the random vectors respectively.
In the absence of the random data $U'$, Alice only receives $U$. As mentioned in
Section 3.3, $u_i = (MM^T)^i u_0$, which is a sequence of vectors from the Krylov
space of the matrix $AA^T + BB^T$ sufficient to determine all eigenvectors of $MM^T$.
For $k$-dimensional data, it is sufficient to have any sequence of $k$ vectors in $U$
to determine $MM^T$. Hence, if the vectors in $U$ were not interspersed with the
vectors in $U'$, the algorithm essentially reveals information about all eigenvectors
to all parties. Furthermore, given a sequence of vectors $u_i, u_{i+1}, \ldots, u_{i+k-1}$
from $U$, Alice can verify that they are indeed from the Krylov space.¹ Introducing
random scaling $r_i u_i$ makes it harder still to verify the Krylov space. While solving
for $k$ vectors, Alice and Bob need to solve for another $k$ parameters $r_1, \ldots, r_k$.

Security is obtained from the following observation: although Alice can verify
that a given set of vectors forms a sequence in the Krylov space, she cannot select
them from a larger set without exhaustive evaluation of all subsets of $k$ vectors. If
the shortest sequence of $k$ vectors from the Krylov space is embedded in a longer
sequence of $N$ vectors, Alice needs $\binom{N}{k}$ checks to find the Krylov space, which
is a combinatorial problem.
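
To get a feel for how quickly this search space grows, the binomial count can be evaluated directly. The function name `krylov_search_space` and the example sizes are hypothetical and serve only as illustration.

```python
import math

def krylov_search_space(total_vectors, k):
    """Number of k-subsets Alice would have to test among the received vectors."""
    return math.comb(total_vectors, k)

# Hypothetical sizes: 10-dimensional data observed over 20 and 40 received vectors.
short_run = krylov_search_space(20, 10)
long_run = krylov_search_space(40, 10)
```

Doubling the number of received vectors multiplies the number of candidate subsets by several orders of magnitude, which is why interspersing decoys is effective even for moderate $k$.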



4.3 Efficiency 

First we analyze the computational time complexity of the protocol. As the total
number of iterations is data dependent, being governed by the convergence rate
discussed in Section 2.1, we analyze the cost per iteration. The computation is
performed by the individual parties in parallel, though synchronized, and the
parties also spend time waiting for intermediate results from other parties.
Obfuscation introduces extra iterations with random data; on average the number
of iterations needed for convergence increases by a factor of $\frac{1}{p}$, where $p$ is the
probability of Trent sending a non-random vector. As the same operations are
performed in an iteration with a random vector, its time complexity is the same
as that of an iteration with a non-random vector.

In the $i$-th iteration, Alice and Bob individually need to perform two matrix
multiplications: $A\alpha_i$ and $A^T(A\alpha_i + B\beta_i)$ for Alice, and $B\beta_i$ and
$B^T(A\alpha_i + B\beta_i)$ for Bob.
The first part involves multiplication of a $k \times m$ matrix by an $m$-dimensional
vector, which is $O(km)$ operations for Alice and $O(kn)$ for Bob. The second part
involves multiplication of an $m \times k$ matrix by a $k$-dimensional vector, which is
$O(km)$ operations for Alice and $O(kn)$ for Bob. Calculating $\|A^T(A\alpha_i + B\beta_i)\|^2$
involves $O(m)$ operations for Alice and analogously $O(n)$ operations for Bob.
The final step involves only a normalization by a scalar and can be again done
in linear time, $O(m)$ for Alice and $O(n)$ for Bob. Therefore, total time complexity
of computations performed by Alice and Bob is $O(km) + O(m) = O(km)$ and
$O(kn) + O(n) = O(kn)$ operations respectively. Trent computes an element-wise
product of the two $k$-dimensional encrypted vectors $E[A\alpha_i]$ and $E[B\beta_i]$, which is
$O(k)$ operations. The multiplication of two encrypted scalars requires only one
operation, making Trent's total time complexity $O(k)$.

¹ If the spectral radius of $MM^T$ is 1.

In each iteration, Alice and Bob encrypt and decrypt two vectors and two
scalar normalization terms, which is equivalent to performing $k + 1$ encryptions
and $k + 1$ decryptions individually, which is $O(k)$ encryptions and decryptions.

In the $i$-th iteration, Alice and Bob each need to transmit $k$-dimensional
vectors to Trent, who computes $E[A\alpha_i + B\beta_i]$ and transmits it back, involving the
transfer of $4k$ elements. Similarly, Alice and Bob each transmit one scalar norm
value to Trent, who sends back another scalar value, involving in all the transfer
of 4 elements. In total, each iteration requires the transmission of $4k + 4 = O(k)$
data elements.

To summarize, the time complexity of the protocol per iteration is $O(km)$
or $O(kn)$ operations, whichever is larger, $O(k)$ encryptions and decryptions, and
$O(k)$ transmissions. In practice, each individual encryption, decryption, and data
transmission takes much longer than a single arithmetic operation.
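
The per-iteration counts above can be collected in a small helper. This is only a bookkeeping sketch: the function names are hypothetical, the multiplication counts ignore lower-order terms, and the expected number of rounds assumes that each round is independently valid with probability $p$.

```python
def per_iteration_cost(k, m, n):
    """Indicative per-iteration counts taken from the analysis above."""
    return {
        "alice_multiplications": 2 * k * m,     # A alpha_i and A^T u_i
        "bob_multiplications": 2 * k * n,       # B beta_i and B^T u_i
        "encryptions_per_party": k + 1,         # one k-vector and one scalar
        "decryptions_per_party": k + 1,
        "elements_transmitted": 4 * k + 4,      # vectors and scalars, both directions
    }

def expected_rounds(valid_iterations, p):
    """Expected number of rounds when only a fraction p of rounds is valid."""
    return valid_iterations / p

cost = per_iteration_cost(k=10, m=100, n=50)
rounds = expected_rounds(valid_iterations=30, p=0.5)
```

For these hypothetical sizes the transmitted volume depends only on `k`, while the local arithmetic scales with the number of samples each party holds.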



5 Conclusion 



In this paper, we proposed a protocol for computing the principal eigenvector
of the combined data shared by multiple parties coordinated by a semi-honest
arbitrator Trent. The data matrices belonging to individual parties and the
correlation matrix of the combined data are protected and cannot be reconstructed.
We used randomization, data padding, and obfuscation to hide the information
which the parties can learn from the intermediate results. The computational
cost for each party is $O(km)$, where $k$ is the number of features and $m$ the number
of data instances, along with $O(k)$ encryption and decryption operations and $O(k)$
data transfer operations.

Potential future work includes extending the protocol to finding the complete
singular value decomposition, particularly with efficient algorithms like
thin SVD [1]. Some of the techniques such as data padding and obfuscation
can be applied to other problems as well. We are working towards a unified
theoretical model for applying and analyzing these techniques in general.
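
6 Appendix

Proof (Lemma 1). We have,

$$\bar{M}^T \bar{M} = \begin{bmatrix} M^T \\ P^T \end{bmatrix} \begin{bmatrix} M & P \end{bmatrix} = \begin{bmatrix} M^T M & M^T P \\ P^T M & I \end{bmatrix}.$$

Multiplying by the eigenvector $\bar{v} = \begin{bmatrix} v_{t \times 1} \\ v'_{s \times 1} \end{bmatrix}$ gives us

$$\bar{M}^T \bar{M} \begin{bmatrix} v \\ v' \end{bmatrix} = \begin{bmatrix} M^T M v + M^T P v' \\ P^T M v + v' \end{bmatrix} = \lambda \begin{bmatrix} v \\ v' \end{bmatrix}.$$

Therefore,

$$M^T M v + M^T P v' = \lambda v, \tag{6}$$
$$P^T M v + v' = \lambda v'. \tag{7}$$

Since $\lambda \neq 1$, Equation (7) implies $v' = \frac{1}{\lambda - 1} P^T M v$. Substituting this into
Equation (6) and using the orthogonality of $P$ gives us

$$M^T M v + \frac{1}{\lambda - 1} M^T P P^T M v = \frac{\lambda}{\lambda - 1} M^T M v = \lambda v.$$

Hence, $M^T M v = (\lambda - 1) v$. □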